Geometric convergence in average reward Markov decision processes

Author

  • W. H. M.
Abstract

  • A submitted manuscript is the author's version of the article upon submission and before peer review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website.
  • The final author version and the galley proof are versions of the publication after peer review.
  • The final published version features the final layout of the paper, including the volume, issue and page numbers.

Similar resources

Extended Geometric Processes: Semiparametric Estimation and Application to Reliability

Keywords: imperfect repair, Markov renewal equation, replacement policy

Lam (2007) introduces a generalization of renewal processes named Geometric processes, where inter-arrival times are independent and identically distributed up to a multiplicative scale parameter, in a geometric fashion. We here envision a more general scaling, not necessarily geometric. The corresponding counting process is named Extended Geometric Process (EGP). Semiparametric estimates are...
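The scaling described in this summary can be illustrated with a minimal sketch. It assumes the usual form of a geometric process, in which the k-th inter-arrival time is X_k = Y_k / a^(k-1) for i.i.d. Y_k and a fixed ratio a > 0. The function names, the exponential base distribution, and the parameter values are hypothetical choices for illustration and are not taken from the paper.

```python
import random


def geometric_process_times(n, ratio, base_sampler, rng):
    """Return n inter-arrival times X_k = Y_k / ratio**(k-1), with Y_k i.i.d.

    `base_sampler(rng)` draws one Y_k; `ratio` is the geometric scale a > 0.
    Both are illustrative assumptions, not the paper's notation.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    # range(n) yields exponents 0..n-1, i.e. k-1 for k = 1..n.
    return [base_sampler(rng) / ratio ** k for k in range(n)]


def arrival_epochs(inter_arrival_times):
    """Cumulative sums of the inter-arrival times (the event epochs)."""
    epochs, t = [], 0.0
    for x in inter_arrival_times:
        t += x
        epochs.append(t)
    return epochs


rng = random.Random(0)
# Exponential(1) base times, ratio 1.2: times shrink on average as k grows.
times = geometric_process_times(5, 1.2, lambda r: r.expovariate(1.0), rng)
epochs = arrival_epochs(times)

# With a constant base time the geometric scaling is visible directly.
deterministic = geometric_process_times(3, 2.0, lambda r: 1.0, rng)
```

Setting `ratio` to 1 recovers an ordinary renewal process; the extended process in the summary would replace `ratio ** k` with a more general scaling sequence.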

Simulation-based optimization of Markov decision processes: An empirical process theory approach

We generalize and build on the PAC Learning framework for Markov Decision Processes developed in Jain and Varaiya (2006). We consider the reward function to depend on both the state and the action. Both the state and action spaces can potentially be countably infinite. We obtain an estimate for the value function of a Markov decision process, which assigns to each policy its expected discounted...

Convergence of Simulation-Based Policy Iteration

Simulation-based policy iteration (SBPI) is a modification of the policy iteration algorithm for computing optimal policies for Markov decision processes. At each iteration, rather than solving the average evaluation equations, SBPI employs simulation to estimate a solution to these equations. For recurrent average-reward Markov decision processes with finite state and action spaces, we provide...
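For context, the evaluation equations mentioned in this summary can be sketched for a fixed policy. The sketch assumes the standard textbook form g + h(s) = r(s) + Σ P(s, s') h(s') with the normalization h(0) = 0 on a recurrent finite chain, and solves it exactly by Gaussian elimination. This is the exact solve that SBPI is described as replacing with simulation, not the SBPI estimator itself. The transition matrix, rewards, and function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col:
                f = m[i][col] / m[col][col]
                for j in range(col, n + 1):
                    m[i][j] -= f * m[col][j]
    return [m[i][n] / m[i][i] for i in range(n)]


def evaluate_policy(P, r):
    """Return (gain g, bias h) for a fixed policy, with h[0] fixed to 0.

    Unknowns are x = [g, h(1), ..., h(n-1)]; row s encodes
    g + h(s) - sum_j P[s][j] * h(j) = r[s].
    """
    n = len(r)
    a = []
    for s in range(n):
        row = [1.0]  # coefficient of the gain g
        for j in range(1, n):
            row.append((1.0 if s == j else 0.0) - P[s][j])
        a.append(row)
    x = solve_linear(a, list(r))
    return x[0], [0.0] + x[1:]


# Hypothetical two-state chain under some fixed policy.
P = [[0.5, 0.5], [0.2, 0.8]]
r = [1.0, 2.0]
gain, bias = evaluate_policy(P, r)
```

For this chain the stationary distribution is (2/7, 5/7), so the gain should equal 12/7, which gives a quick sanity check on the solve.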

Optimal Nudging: Solving Average-Reward Semi-Markov Decision Processes as a Minimal Sequence of Cumulative Tasks

This paper describes a novel method to solve average-reward semi-Markov decision processes by reducing them to a minimal sequence of cumulative reward problems. The usual solution methods for this type of problem update the gain (optimal average reward) immediately after observing the result of taking an action. The alternative introduced, optimal nudging, relies instead on setting the gain t...

Learning in Average Reward Stochastic Games: A Reinforcement Learning (Nash-R) Algorithm for Average Reward Irreducible Stochastic Games

A large class of sequential decision-making problems under uncertainty with multiple competing decision makers can be modeled as stochastic games. Stochastic games can be viewed as multiplayer extensions of Markov decision processes (MDPs). In this paper, we develop a reinforcement learning algorithm to obtain an average reward equilibrium for irreducible stochastic games. In our ...

Journal title:

Volume   Issue

Pages  -

Publication date: 2017